11 research outputs found

    A Collaborative Approach to Computational Reproducibility

    Although reproducibility is a standard in the natural sciences, it has been applied only episodically in experimental computer science. Scientific papers often present many tables, plots, and pictures that summarize the results obtained, but only loosely describe the steps taken to derive them. Not only can the methods and their implementation be complex, but their configuration may also require setting many parameters and/or depend on particular system configurations. While many researchers recognize the importance of reproducibility, the challenge of achieving it often outweighs the benefits. Fortunately, the community has recently designed and implemented a plethora of reproducibility solutions. In particular, packaging tools (e.g., ReproZip) and virtualization tools (e.g., Docker) are promising ways to facilitate reproducibility for both authors and reviewers. To address the incentive problem, we have implemented a new publication model for the Reproducibility Section of the Information Systems journal. In this section, authors submit a reproducibility paper that explains in detail the computational assets of a manuscript previously published in Information Systems.

    Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

    Automatic machine learning (AutoML) is an important problem at the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently, AlphaD3M reached state-of-the-art results with an order-of-magnitude speedup by using reinforcement learning with self-play. In this work we extend AlphaD3M with a pipeline grammar and a pre-trained model that generalizes across many different datasets and similar tasks. Our results show improved performance compared with our earlier work and with existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research, we make our data, models, and code publicly available.
    Comment: ICML Workshop on Automated Machine Learning
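    To illustrate how a pipeline grammar constrains the search space, the sketch below enumerates every pipeline a small context-free grammar can derive. The grammar, its nonterminals (PIPELINE, IMPUTE, ENCODE, ESTIMATOR), and the primitive names are hypothetical stand-ins, not AlphaD3M's actual grammar; in the system described above, a learned model guides the search over such derivations instead of enumerating them exhaustively.

```python
from itertools import product

# Hypothetical toy pipeline grammar (NOT AlphaD3M's actual grammar).
# Each nonterminal maps to a list of alternative productions; any symbol
# that is not a key of GRAMMAR is a terminal, i.e. a primitive name.
GRAMMAR = {
    "PIPELINE": [["IMPUTE", "ENCODE", "ESTIMATOR"], ["ENCODE", "ESTIMATOR"]],
    "IMPUTE": [["SimpleImputer"]],
    "ENCODE": [["OneHotEncoder"], ["OrdinalEncoder"]],
    "ESTIMATOR": [["RandomForest"], ["LogisticRegression"]],
}


def expand(symbol):
    """Return every sequence of primitives derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal: a single one-primitive sequence
    results = []
    for production in GRAMMAR[symbol]:
        # Expand each symbol of the production independently, then combine
        # the alternatives with a Cartesian product and flatten each combo.
        parts = [expand(s) for s in production]
        for combo in product(*parts):
            results.append([prim for part in combo for prim in part])
    return results


pipelines = expand("PIPELINE")
for p in pipelines:
    print(" -> ".join(p))
```

    The grammar only ever produces well-formed pipelines that end in an estimator, which is the kind of pruning a grammar contributes before any learned model ranks candidates.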

    AlphaD3M: An Open-Source AutoML Library for Multiple ML Tasks

    Peer reviewed. We present AlphaD3M, an open-source Python library that supports a wide range of machine learning tasks over different data types. We discuss the challenges involved in supporting multiple tasks and how AlphaD3M addresses them by combining deep reinforcement learning and meta-learning to effectively construct pipelines from a large collection of primitives. To better integrate AutoML into the data science lifecycle, we have built an ecosystem of tools around AlphaD3M that support user-in-the-loop tasks, including selecting suitable pipelines and developing custom solutions for complex problems. We present use cases that demonstrate some of these features. We report the results of a detailed experimental evaluation showing that AlphaD3M is effective and derives high-quality pipelines for a diverse set of problems, with performance comparable or superior to that of state-of-the-art AutoML systems.

    AlphaD3M: Machine Learning Pipeline Synthesis

    Peer reviewed. We introduce AlphaD3M, an automatic machine learning (AutoML) system based on meta-reinforcement learning using sequence models with self-play. AlphaD3M is based on edit operations performed over machine learning pipeline primitives, which makes it explainable. We compare AlphaD3M with the state-of-the-art AutoML systems Auto-sklearn, Autostacker, and TPOT on OpenML datasets. AlphaD3M achieves competitive performance while being an order of magnitude faster, reducing computation time from hours to minutes, and is explainable by design.
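    The idea of searching by edit operations over pipeline primitives can be sketched as a neighborhood function: from one pipeline, list every pipeline reachable by a single edit. This minimal sketch assumes three edit types (insert, delete, substitute) and a small hypothetical primitive vocabulary; it shows only the move set a search policy would choose from, not AlphaD3M's learned policy or its actual primitive library.

```python
# Hypothetical primitive vocabulary (illustrative names only).
PRIMITIVES = ["SimpleImputer", "StandardScaler", "OneHotEncoder", "RandomForest"]


def insert(pipeline, index, primitive):
    """Insert `primitive` before position `index`."""
    return pipeline[:index] + [primitive] + pipeline[index:]


def delete(pipeline, index):
    """Remove the primitive at position `index`."""
    return pipeline[:index] + pipeline[index + 1:]


def substitute(pipeline, index, primitive):
    """Replace the primitive at position `index` with `primitive`."""
    return pipeline[:index] + [primitive] + pipeline[index + 1:]


def neighbors(pipeline):
    """All distinct pipelines reachable from `pipeline` by exactly one edit."""
    result = set()
    # Insertions are allowed at every position, including the end.
    for i in range(len(pipeline) + 1):
        for p in PRIMITIVES:
            result.add(tuple(insert(pipeline, i, p)))
    for i in range(len(pipeline)):
        # Deletions are skipped when they would leave an empty pipeline.
        if len(pipeline) > 1:
            result.add(tuple(delete(pipeline, i)))
        for p in PRIMITIVES:
            if p != pipeline[i]:
                result.add(tuple(substitute(pipeline, i, p)))
    result.discard(tuple(pipeline))  # an edit must change the pipeline
    return result


print(sorted(neighbors(["RandomForest"])))
```

    Because every candidate differs from its parent by one named edit, the sequence of edits that produced a final pipeline doubles as a human-readable explanation of how it was built.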

    ReproZip: 1.0.8

    Behavior changes:
    - ReproZip no longer defaults to overwriting trace directories. It will ask what to do, or exit with an error, unless --continue or --overwrite is provided.
    Bugfixes:
    - Fix an issue identifying Debian packages when a file belongs to two packages.
    - Fix the Python error "Mixing iteration and read methods would lose data".
    - Fix reprounzip info showing some numbers as 0 instead of hiding them in non-verbose mode.
    - Another fix to X server IP determination for Docker.
    Enhancements:
    - New GUI for reprounzip, allowing one to unpack without using the command line.
    - Add filters to remove some common file types from packed files (.pyc) or detected input files (.py, .so, ...).
    - Add JSON output format to reprounzip info.
    - Allow using the VirtualBox display to reproduce X11-enabled experiments.
    Downloads:
    - reprozip (tarball)
    - reprounzip (wheel, tarball)
    - reprounzip-docker (wheel, tarball)
    - reprounzip-vagrant (wheel, tarball)
    - reprounzip-vistrails (wheel, tarball)
    - reprounzip-qt 0.1 (wheel, tarball)
    - Windows installer (Python 2.7, reprounzip, plugins and GUI)
    - Mac installer (Python 2.7, reprounzip, plugins and GUI)

    uvcdat: UV-CDAT 2.6

    The UV-CDAT team is pleased to announce the release of UV-CDAT version 2.6. Many thanks to the users, testers, and developers who helped UV-CDAT reach this milestone. This is a bug-fix release: several major and minor bugs have been fixed in version 2.6, so we strongly recommend that users upgrade their UV-CDAT installation. From this release on, UV-CDAT is distributed via conda, using either of:
        conda install -c uvcdat uvcdat
        conda create -n uvcdat-2.6 -c uvcdat uvcdat
    We also point users to an Askbot website that helps the UV-CDAT user community; it covers version 2.2 onward. See: http://uvcdat.askbot.co